(57) ABSTRACT
A method and system for incorporating a user's preferences
in an information clustering system, the user-configurable 
information clustering system comprising an information 
clustering engine for clustering of information based on 
similarities, a user interface module for displaying the 
information groupings and obtaining user preferences, a 
personalization module for defining, labeling, modifying, 
storing and retrieving cluster structure, and a knowledge 
base where the user-defined cluster structure is stored. 
METHOD AND SYSTEM FOR
USER-CONFIGURABLE CLUSTERING OF 
INFORMATION 

FIELD OF THE INVENTION 

[0001] This invention relates to the field of pattern pro- 
cessing and information management and more specifically 
to methods and systems for incorporating user interests and 
preferences in automated information clustering. Related 
fields of invention include information database content 
management, self-organizing information databases, docu- 
ment clustering, and personalized systems. 

BACKGROUND OF THE INVENTION 

[0002] Categorization and clustering have been two fun- 
damental approaches to information organization and infor- 
mation database content management. 

[0003] Categorization or classification is supervised in 
nature. A user defines a fixed number of classes or catego- 
ries. The task is to assign a pattern or object to one or more 
of the classes. Categorization provides good control in the 
sense that it organizes the information according to the 
structure defined by the user. However, due to the predefined 
structure, categorization is not well suited to handling novel 
data. In addition, much effort is needed to build a catego- 
rization system. It is necessary to specify classification 
knowledge in terms of classification rules or keywords 
(disclosed in U.S. Pat. No. 5,371,807) or to construct a 
categorization system through some supervised learning 
algorithms (disclosed in U.S. Pat. Nos. 5,671,333 and
5,675,710). The former requires knowledge specification (e.g.,
written classification rules) and the latter requires example 
annotations (i.e. labeling information). Both are labor inten- 
sive. 

[0004] Clustering is unsupervised in nature. For unsuper- 
vised systems (U.S. Pat. Nos. 5,857,179 and 5,787,420), 
there is no need to train or construct a classifier since 
information is organized automatically into groups based on
similarities. However, a user has very little control over how
the information is grouped together. Although it is possible
to fine tune the parameters of the similarity measures to 
control the degree of coarseness, the effect of changing a 
parameter cannot be predicted; changing one parameter 
could affect all clustered results. In addition, the structure 
established through the clustering process is unpredictable. 
Whereas clustering is acceptable for a pool of relatively 
static information, in situations where new information is 
received every day, information with similar content may be 
grouped (based on different themes) into different clusters 
on different days. This ever-changing cluster structure is 
highly undesirable for the user who is navigating the infor- 
mation database to find desired information. Imagine the 
frustration of reading a newspaper with a different layout 
every day! U.S. Pat. No. 5,911,140 attempts to provide a 
solution by ordering document clusters based on user inter- 
ests. However, the cluster ranking relies on the availability 
of the ranking of each document in the clusters and only very 
minimal user preferences are taken into account. 

SUMMARY OF THE INVENTION 

[0005] In order to overcome the various shortcomings of 
systems which effect only categorization or clustering of
information with little or no account taken of user
preferences, this invention provides a method and system that
incorporate users' preferences in an information clustering 
system. In general, this system allows a user to create a 
cluster structure and influence or personalize the cluster 
structure by indicating his or her own preferences as to how 
information should be grouped. This invention further 
allows the user to store the cluster structure and subse- 
quently retrieve it for future use. 

[0006] The user-configurable information clustering sys- 
tem comprises an information clustering engine for cluster- 
ing of information based on similarities, a user interface 
module for displaying the information groupings and obtain- 
ing user preferences, a personalization module for defining, 
labeling, modifying, storing and retrieving cluster structure, 
and a knowledge base where a user-defined cluster structure 
is stored. 

[0007] According to the invention, each unit of informa- 
tion is represented by an information vector. A user prefer- 
ence, indicating a preferred grouping for the corresponding 
unit of information, can be represented by a preference 
vector. In addition, information, which may be in the form 
of a database, is supplied to the user-configurable
information clustering system by any means well known in the
art.

[0008] In the preferred embodiment, the information
clustering engine is a hybrid neural network comprising two
input fields, $F_1^a$ and $F_1^b$, and an $F_2$ cluster field. The
$F_1^a$ field serves as the input field for the information
vector A. The $F_1^b$ field serves as the input field for the
preference vector B. The $F_2$ field contains a plurality of
cluster nodes, each encoding a template information vector
$w_j^a$ and a template preference vector $w_j^b$. Given an
information vector A with an associated preference vector B,
the system first searches for an $F_2$ cluster J encoding a
template information vector $w_J^a$ that is closest to the
information vector A according to a similarity function. It
then checks whether the associated $F_2$ template preference
vector $w_J^b$ of the selected category matches the input
preference vector. If so, the templates of the $F_2$ cluster J
are modified to encode the input information and preference
vectors. Otherwise, the cluster is reset and the system
repeats the process until a match is found.

[0009] Through the user interface and the personalization 
module, the user is able to influence the cluster structure by 
indicating his own preferences in the form of preference 
vectors. The user can create a new cluster, label an existing 
cluster, and/or modify cluster structure by merging and 
splitting clusters. In addition, the resulting customized clus- 
ter structure can be stored in the cluster structure knowledge 
base and retrieved at a later stage for processing new 
information. 

[0010] In one embodiment of the invention, a system and 
method are provided for customizing the organization of an 
existing set of information according to the user's knowl- 
edge and preferences. In another embodiment, a system and 
method are provided for creating a cluster structure auto- 
matically and subsequently modifying this machine-gener- 
ated cluster structure according to the user's preferences. In 
yet another embodiment, a system and method are provided 
for detecting new information and analyzing trends wherein 
the user, through repeated personalization of the cluster 
structure, identifies new information and trends previously 
unknown to him or her. 
[0011] The disclosed invention is more flexible than a pure
categorization system in which information must be
assigned to one or more pre-defined categories or groups. At
the same time, it is more flexible than a pure clustering or 
self-organizing system in which information is grouped 
according to similarities but the user has very little control 
over how the information is organized. 

[0012] The invention has a number of advantages over the 
prior art: The invention performs clustering or
self-organization of information based on similarities in content, i.e.
similarities in information vectors. The information can be 
automatically organized without user training or prior con- 
struction of a classifier. The invention allows the user to 
correct or change the organization of the information as 
necessary. The invention also allows the user to intervene in 
the organization of the information both globally and locally. 
Further, the invention allows the user to control the coarse- 
ness of the information groupings without tuning the param- 
eters of complex similarity functions. The invention also 
allows the user to indicate directly how specific units of 
information are to be organized. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0013] Embodiments of the invention will now be 
described by way of example with reference to the accom- 
panying drawings in which: 

[0014] FIG. 1 illustrates an embodiment of a user-config- 
urable information clustering system according to the 
present invention. 

[0015] FIG. 2A shows an exemplary architecture for the 
information clustering engine of FIG. 1 for clustering infor- 
mation in response to user preferences. Information is 
encoded as information vectors. User preferences are 
encoded as preference vectors. The $F_1^a$ field serves as the
input field for the information vectors. The $F_1^b$ field serves
as the input field for the preference vectors. Information
clusters (represented as $F_2$) are formed through the
synchronized clustering of the information and preference vectors.

[0016] FIG. 2B illustrates the category choice process of 
the information clustering engine of FIG. 1. 

[0017] FIG. 2C illustrates the template matching and the 
template learning process of the information clustering 
engine of FIG. 1. 

[0018] FIG. 3 illustrates an exemplary flow diagram for 
incorporating user preferences in the information clustering 
process. 

[0019] FIG. 4 illustrates an exemplary set of personaliza- 
tion functions that a user could use while performing infor- 
mation clustering. 

DETAILED DESCRIPTION OF THE 
INVENTION 

[0020] The invention is concerned with the organization of 
information, based on similarities in content, i.e. similarities 
in information vectors, according to a user's knowledge and 
preferences. The information comprises text, image, audio, 
video, or any combinations thereof. According to the
invention, each unit of information (Φ), defined as an
individual element of information, may be any object, for
example a document, person, company, country, etc., and can be
represented by a complement-coded 2M-dimensional information
vector A of attributes or features,

$$A = (a_1, \ldots, a_M, a_1^c, \ldots, a_M^c) \qquad (1)$$

[0021] where $a_i$ is a real-valued number between zero and
one, indicating the degree of presence of attribute i, and
$a_i^c = 1 - a_i$. Complement coding represents both the
on-response and the off-response to an input vector and
preserves amplitude information upon normalization.
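As a minimal sketch of eqn. (1), assuming the M attribute values have already been scaled to [0, 1], complement coding can be written in Python as follows. The function name `complement_code` and the example values are hypothetical. The example also illustrates the normalization property mentioned above: the norm (sum of components) of a complement-coded vector is always M, whatever the feature values.

```python
from typing import List, Sequence


def complement_code(features: Sequence[float]) -> List[float]:
    """Return (a_1, ..., a_M, 1 - a_1, ..., 1 - a_M) for features a_i in [0, 1]."""
    for a in features:
        if not 0.0 <= a <= 1.0:
            raise ValueError("each attribute value must lie in [0, 1]")
    # On-response (the features themselves) followed by the off-response
    return list(features) + [1.0 - a for a in features]


# Hypothetical degrees of presence for M = 3 keywords
A = complement_code([1.0, 0.25, 0.0])
print(A)       # [1.0, 0.25, 0.0, 0.0, 0.75, 1.0]
print(sum(A))  # 3.0, i.e. |A| = M whatever the feature values
```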

[0022] In the case of documents, for example, the features
in the information vectors could be token words commonly 
known as keywords. For example, in the exemplary case 
where the information is a business document, keywords 
may include "share", "market", "stock", "acquisition", 
"trading" etc. The feature sets can be predefined manually or 
generated automatically from the information set. 

[0023] User preferences are represented by preference 
vectors that indicate the preferred groupings of the infor- 
mation. A preference vector B is a complement coded 
2N-dimensional vector defined as 

$$B = (b_1, \ldots, b_N, b_1^c, \ldots, b_N^c) \qquad (2)$$

[0024] where $b_k$ is either zero or one, indicating the
presence or absence of a user-defined class label $L_k$,
and $b_k^c = 1 - b_k$. A user's knowledge comprises that
information acquired by one in the user's profession, community, or
field of endeavor or study. Also, this knowledge may include 
highly specialized information developed, acquired or used 
based on a user's unique experiences. An example of user 
knowledge includes a physician's knowledge of diseases, 
symptoms for those diseases, and appropriate treatments, 
including appropriate drugs; knowledge of this type from 
many practitioners could be placed in a database which 
could increase in content as new diseases were discovered, 
treated and cured. Using the invention, a physician could 
configure the database and additions thereto over time 
according to her own experience and specialty. Another 
example of user knowledge is a market analyst's under- 
standing of trends in a given manufacturing sector over a 
given period of time; a database of wire news reports over 
a given period or updated on a periodic basis may contain 
information of use to the market analyst. Using the inven- 
tion, the market analyst could configure such a database and 
updates thereto according to the particular sector of interest 
to him. Yet another example of user knowledge is a chef's
knowledge of the cooking arts; a database of culinary dishes, 
ingredients, foods, etc., which may be intermittently modi- 
fied, contains information of use to the culinary field. Using 
the invention, the chef can, for example, create groupings of 
ingredients for culinary dishes of interest to her. Another 
example of user knowledge is the preferred grouping of 
news articles by a journalist. A journalist, depending on his 
target readers, may organize articles from local and foreign 
sources into specific groupings that would be most appro- 
priate for his purposes. For example, a Singapore journalist 
might organize foreign news into threads that are of interest 
to Singaporeans, such as "Michael Fay Event", "Singapore 
Economics" etc. Other examples will be evident to those 
skilled in the art. In addition, the user's preferences are 
derived from, for example, his personal or professional 
informational desires, organizational goals and objectives, 
analytical training and skills, interpretational biases, and 
experiences with identification of information. User prefer- 
ences include informational and organizational objectives 
useful to organizing and increasing the knowledge base of
the user. For example, in the case of the Singapore journalist 
mentioned above, preferences used in organizing the articles 
are based on the journalist's interpretation and experience on 
the relevancy of such topics to his readers and the volume of 
articles on such topics over time. 
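A user's class labels can be turned into the complement-coded preference vector of eqn. (2) along the following lines. This is a sketch only: the helper name `preference_vector` and the choice to represent labels as strings are assumptions for illustration, and the two labels are the Singapore journalist's examples from above.

```python
from typing import Iterable, List, Sequence


def preference_vector(labels: Sequence[str], assigned: Iterable[str]) -> List[int]:
    """Complement-coded preference vector B = (b_1..b_N, 1-b_1..1-b_N).

    labels   -- the user's class labels L_1, ..., L_N, in a fixed order
    assigned -- the labels the user attaches to this unit of information
    """
    chosen = set(assigned)
    unknown = chosen - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    b = [1 if label in chosen else 0 for label in labels]  # b_k in {0, 1}
    return b + [1 - bk for bk in b]                         # b_k^c = 1 - b_k


labels = ["Michael Fay Event", "Singapore Economics"]
print(preference_vector(labels, ["Singapore Economics"]))  # [0, 1, 1, 0]
```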

[0025] Units of information which share common 
attributes or content, as indicated by their information vec- 
tors, are said to be similar to each other. Units of information 
which share few or no common attributes or content are said 
to be dissimilar to each other. The system may automatically 
determine similarity between two units of information based 
on the application of a similarity function. Alternately, the 
user may determine the degree to which units of information 
appear to be similar to each other based on his or her 
knowledge and preferences. 

[0026] The disclosed method can be executed using a 
computer system, such as a personal computer or the like, as 
is well known in the art. The disclosed system can be a 
stand-alone system, or it can be incorporated in a computer 
system, in which case the user interface can be the graphical 
or other user interface of the computer system and the cluster 
structure knowledge base can be, for example, a file in any 
of the computer system's storage areas, elements or devices. 

[0027] Referring to FIG. 1, there is provided a user- 
configurable information clustering system 10 comprising 
an information clustering engine 20, a personalization mod- 
ule 50, a cluster structure knowledge base 14, and a user 
interface module 16. Information in the form of information 
and preference vectors is supplied to the clustering engine 
20, which comprises a hybrid neural network model that 
performs a hybrid of supervised and unsupervised learning. 
The neural network may be conventional. For example, an 
ARTMAP system or an ARAM system, such as described in
"Adaptive Resonance Associative Map", published in "Neural
Networks", Vol. 8, No. 3, pp. 437-446 (1995), which is
incorporated herein by reference, can be used. The user
interface module 16 may comprise a graphical user interface,
keyboard, keypad, mouse, voice command recognition system, or
any combination thereof, and may permit graphical
visualization of information groupings. The cluster structure
knowledge base 14 may be any conventional recordable
storage format, for example a file in a storage device, such
as magnetic or optical storage media, or in a storage area of
a computer system.

[0028] The user-configurable information clustering sys- 
tem 10 allows a user to personalize or influence the cluster 
structure by indicating his or her own preferences in the 
form of preference vectors. Through the user interface 
module 16 and the personalization module 50, the user is 
able to create a customized cluster structure by selective 
and/or repeated application of the following: creating a new 
cluster, labeling an existing cluster, and modifying cluster 
structure by merging and splitting clusters. In addition, the 
customized cluster structure can be stored in the cluster 
structure knowledge base 14 and retrieved at a later stage for 
processing new information. 

[0029] As described in the article cited above, ARAM is a 
family of neural network models that performs incremental 
supervised learning of recognition categories (pattern 
classes) and multidimensional maps of both binary and 
analog patterns. Referring to FIG. 2A, an ARAM system can
be visualized as two overlapping Adaptive Resonance
Theory (ART) modules consisting of two input fields $F_1^a$ 22
and $F_1^b$ 26 with a cluster field $F_2$ 30.

[0030] Each $F_2$ cluster node j is associated with an
adaptive template information vector $w_j^a$ and corresponding
adaptive template preference vector $w_j^b$. Initially, all
cluster nodes are uncommitted and all weights are set equal
to 1. After a cluster node is selected for encoding, it becomes
committed.

[0031] Fuzzy ARAM dynamics are determined by the
choice parameters $\alpha_a > 0$ and $\alpha_b > 0$; the learning
rates $\beta_a$ in [0, 1] and $\beta_b$ in [0, 1]; the vigilance
parameters $\rho_a$ 24 in [0, 1] and $\rho_b$ 28 in [0, 1]; and a
contribution parameter $\gamma$ in [0, 1]. The choice parameters
$\alpha_a$ and $\alpha_b$ control the bias towards choosing an $F_2$
cluster whose template information and preference vectors
have a larger norm or magnitude. The learning rates $\beta_a$
and $\beta_b$ control how fast the template information and
preference vectors $w_j^a$ and $w_j^b$ adapt to the input
information and preference vectors A and B, respectively. The
vigilance parameters $\rho_a$ and $\rho_b$ determine the criteria
for a satisfactory match between the input and the template
information and preference vectors, respectively. The
contribution parameter $\gamma$ controls the weighting of
contribution from the information and preference vectors
when selecting an $F_2$ cluster.

[0032] Referring to FIG. 2B, given an information vector 
A with an associated preference vector B, the system first 
searches for an $F_2$ cluster J encoding a template information
vector $w_J^a$ and a template preference vector $w_J^b$ paired
therewith that are closest to the input information vector A
and the input preference vector B, respectively, according to
a similarity function. Specifically, for each $F_2$ cluster j, the
information clustering engine calculates a similarity score
based on the input information and preference vectors A and
B, respectively, and the template information and preference
vectors $w_j^a$ and $w_j^b$, respectively. An example of a
similarity function is given below as the category choice
function, eqn. (3). The $F_2$ cluster that has the maximal
similarity score is then selected and indexed at J.

[0033] Referring to FIG. 2C, the information clustering
engine performs template matching to verify that the
template information vector $w_J^a$ and the template preference
vector $w_J^b$ of the selected category J match well with the
input information vector A and the input preference vector
B, respectively, according to another similarity function, e.g.
eqn. (4) below. If so, the system performs template learning
to modify the template vectors $w_J^a$ and $w_J^b$ of the $F_2$
cluster J to encode the input information and preference
vectors A and B, respectively. Otherwise, the cluster is reset
and the system repeats the process until a match is found.
The detailed algorithm is given below.

[0034] The ART modules used in ARAM may be of a type 
which categorizes binary patterns, analog patterns, or a 
combination of the two patterns (referred to as "fuzzy 
ART"), as is known in the art. Described below is a fuzzy 
ARAM model composed of two overlapping fuzzy ART 
modules. 

[0035] Referring to FIG. 3, the dynamics of the
information clustering engine 20 are described as follows.
Given a pair of $F_1^a$ and $F_1^b$ input vectors A and B, for
example, an information vector and preference vector,
respectively, for each $F_2$ node j, a category choice process 32
computes the choice function $T_j$ as defined by

$$T_j = \gamma \frac{|A \wedge w_j^a|}{\alpha_a + |w_j^a|} + (1 - \gamma) \frac{|B \wedge w_j^b|}{\alpha_b + |w_j^b|} \qquad (3)$$

[0036] where, for vectors p and q, the fuzzy AND
operation is defined by $(p \wedge q)_i \equiv \min(p_i, q_i)$, and
the norm $|\cdot|$ is defined by $|p| \equiv \sum_i p_i$. The system
is said to make a choice when at most one $F_2$ node can
become active. The choice is indexed at J by a select winner
process 34 where $T_J = \max\{T_j : \text{for all } F_2 \text{ nodes } j\}$.
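Eqn. (3) and the select winner step can be sketched in plain Python as below. The defaults $\alpha_a = \alpha_b = 0.1$ and $\gamma = 0.5$ are taken from the exemplary parameter setting in paragraph [0044]; the function names and the two-node example are hypothetical. Ties are broken in favour of the lowest node index, which is one possible convention; the text does not specify one.

```python
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def fuzzy_and(p: Vec, q: Vec) -> List[float]:
    """(p ^ q)_i = min(p_i, q_i)."""
    return [min(x, y) for x, y in zip(p, q)]


def norm(p: Vec) -> float:
    """|p| = sum_i p_i (all components are non-negative here)."""
    return sum(p)


def choice(A: Vec, B: Vec, wa: Vec, wb: Vec,
           alpha_a: float = 0.1, alpha_b: float = 0.1, gamma: float = 0.5) -> float:
    """Category choice function T_j of eqn. (3)."""
    return (gamma * norm(fuzzy_and(A, wa)) / (alpha_a + norm(wa))
            + (1 - gamma) * norm(fuzzy_and(B, wb)) / (alpha_b + norm(wb)))


def select_winner(A: Vec, B: Vec, templates: List[Tuple[Vec, Vec]]) -> int:
    """Index J of the F_2 node with the largest T_j (first index wins ties)."""
    scores = [choice(A, B, wa, wb) for wa, wb in templates]
    return max(range(len(scores)), key=scores.__getitem__)


A = [1.0, 0.0, 0.0, 1.0]   # complement-coded input, M = 2
B = [1.0, 1.0]             # no preference expressed, N = 1
templates = [([1.0, 0.0, 0.0, 1.0], [1.0, 1.0]),
             ([0.0, 1.0, 1.0, 0.0], [1.0, 1.0])]
print(select_winner(A, B, templates))  # 0
```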

[0037] A template matching process 36 then checks if the 
selected cluster represents a good match. Specifically, a 
check 38 is performed to verify if the match functions, $m_J^a$
and $m_J^b$, meet the vigilance criteria in their respective
modules:

$$m_J^a = \frac{|A \wedge w_J^a|}{|A|} \ge \rho_a \quad \text{and} \quad m_J^b = \frac{|B \wedge w_J^b|}{|B|} \ge \rho_b \qquad (4)$$

[0038] Resonance occurs if both criteria are satisfied. 
Learning then ensues, as defined below. If any of the 
vigilance constraints is violated, mismatch reset 42 occurs in 
which the value of the choice function $T_J$ is set to 0 for the
duration of the input presentation. The search process 
repeats, selecting a new index J until resonance is achieved. 
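The vigilance check of eqn. (4) and the mismatch-reset search can be sketched as follows. The sketch assumes the choice values $T_j$ have already been computed (for instance with eqn. (3)); the $T_j$ values, templates and function names in the example are made up for illustration. Rather than setting a reset node's $T_J$ to 0 as in the text, the sketch sets it to minus infinity, so that a reset node cannot be chosen again even when every remaining score is 0.

```python
from typing import List, Optional, Sequence, Tuple

Vec = Sequence[float]


def match(x: Vec, w: Vec) -> float:
    """Match function m = |x ^ w| / |x| used in eqn. (4)."""
    return sum(min(a, b) for a, b in zip(x, w)) / sum(x)


def resonates(A: Vec, B: Vec, wa: Vec, wb: Vec, rho_a: float, rho_b: float) -> bool:
    """True when both vigilance criteria of eqn. (4) hold."""
    return match(A, wa) >= rho_a and match(B, wb) >= rho_b


def search(T: Sequence[float], A: Vec, B: Vec, templates: List[Tuple[Vec, Vec]],
           rho_a: float, rho_b: float) -> Optional[int]:
    """Return the first node, in order of decreasing T_j, that resonates."""
    T = list(T)
    for _ in range(len(T)):
        J = max(range(len(T)), key=T.__getitem__)
        wa, wb = templates[J]
        if resonates(A, B, wa, wb, rho_a, rho_b):
            return J                  # resonance: learning would follow
        T[J] = float("-inf")          # mismatch reset for this presentation
    return None                       # every listed node was reset


A, B = [1.0, 0.0, 0.0, 1.0], [1.0, 1.0]
templates = [([1.0, 1.0, 0.0, 0.0], [1.0, 1.0]),   # m_a = 0.5
             ([1.0, 0.0, 0.0, 1.0], [1.0, 1.0])]   # m_a = 1.0
print(search([0.9, 0.8], A, B, templates, rho_a=0.75, rho_b=1.0))  # 1
print(search([0.9, 0.8], A, B, templates, rho_a=0.5, rho_b=1.0))   # 0
```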

[0039] Once the search ends, a template learning process
40 updates the template information and preference vectors
$w_J^a$ and $w_J^b$, respectively, according to the equations

$$w_J^{a(\text{new})} = (1 - \beta_a)\, w_J^{a(\text{old})} + \beta_a \left(A \wedge w_J^{a(\text{old})}\right) \qquad (5)$$

[0040] and

$$w_J^{b(\text{new})} = (1 - \beta_b)\, w_J^{b(\text{old})} + \beta_b \left(B \wedge w_J^{b(\text{old})}\right) \qquad (6)$$

[0041] respectively. For efficient coding of noisy input
sets, it is useful to set $\beta_a = \beta_b = 1$ when J is an
uncommitted node, and then take $\beta_a < 1$ and $\beta_b < 1$
after the cluster node is committed. Fast learning
corresponds to setting $\beta_a = \beta_b = 1$ for committed nodes.
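Eqns. (5) and (6) have the same form, so one update function covers both. The sketch below (function name hypothetical) also shows two consequences: with $\beta = 1$ an uncommitted node, whose weights are all 1, simply copies the input; and because each new component blends $w_i$ with $\min(x_i, w_i)$, template values never increase for $\beta$ in [0, 1].

```python
from typing import List, Sequence


def learn(x: Sequence[float], w: Sequence[float], beta: float) -> List[float]:
    """w_new = (1 - beta) * w_old + beta * (x ^ w_old), as in eqns. (5)-(6)."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return [(1.0 - beta) * wi + beta * min(xi, wi) for xi, wi in zip(x, w)]


w = [1.0, 1.0, 1.0, 1.0]                    # uncommitted node: all weights 1
w = learn([0.75, 0.25, 0.25, 0.75], w, beta=1.0)
print(w)                                    # [0.75, 0.25, 0.25, 0.75]
w = learn([0.5, 0.5, 0.5, 0.5], w, beta=1.0)
print(w)                                    # [0.5, 0.25, 0.25, 0.5]
```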

[0042] At the start of each input presentation, the
vigilance parameter $\rho_a$ equals a baseline vigilance $\bar{\rho}_a$.
If a reset occurs in the category field $F_2$, a match tracking
process 44 increases $\rho_a$ until it is slightly larger than the
match function $m_J^a$. The search process then selects
another $F_2$ node J under the revised vigilance criterion.
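Putting eqns. (3) to (6) and match tracking together gives a compact presentation loop. This is a minimal sketch under stated assumptions, not the patented implementation: the class and method names are hypothetical, $\epsilon$ = 1e-6 for "slightly larger" is an assumed value, only committed nodes are stored, and a new node is committed (with $\beta = 1$, so its templates copy A and B) only after every committed node has been reset. Match tracking raises $\rho_a$ but never lowers it, so a reset caused by the information-side test leaves $\rho_a$ unchanged.

```python
from typing import List, Sequence

Vec = Sequence[float]


def fuzzy_and(p: Vec, q: Vec) -> List[float]:
    return [min(x, y) for x, y in zip(p, q)]


class FuzzyARAM:
    """Minimal fuzzy ARAM sketch: choice (3), vigilance (4), learning (5)-(6)
    and match tracking on rho_a."""

    def __init__(self, alpha_a=0.1, alpha_b=0.1, beta_a=1.0, beta_b=1.0,
                 rho_a_bar=0.5, rho_b=1.0, gamma=0.5, eps=1e-6):
        self.alpha_a, self.alpha_b = alpha_a, alpha_b
        self.beta_a, self.beta_b = beta_a, beta_b
        self.rho_a_bar, self.rho_b = rho_a_bar, rho_b
        self.gamma, self.eps = gamma, eps
        self.wa: List[List[float]] = []   # template information vectors w_j^a
        self.wb: List[List[float]] = []   # template preference vectors w_j^b

    def _choice(self, A: Vec, B: Vec, j: int) -> float:
        wa, wb = self.wa[j], self.wb[j]
        return (self.gamma * sum(fuzzy_and(A, wa)) / (self.alpha_a + sum(wa))
                + (1 - self.gamma) * sum(fuzzy_and(B, wb)) / (self.alpha_b + sum(wb)))

    def present(self, A: Vec, B: Vec) -> int:
        """Present one (A, B) pair and return the index J of the resonating node."""
        rho_a = self.rho_a_bar                 # vigilance starts at the baseline
        T = [self._choice(A, B, j) for j in range(len(self.wa))]
        while T and max(T) > float("-inf"):
            J = max(range(len(T)), key=T.__getitem__)
            m_a = sum(fuzzy_and(A, self.wa[J])) / sum(A)
            m_b = sum(fuzzy_and(B, self.wb[J])) / sum(B)
            if m_a >= rho_a and m_b >= self.rho_b:       # resonance
                self.wa[J] = [(1 - self.beta_a) * w + self.beta_a * min(x, w)
                              for x, w in zip(A, self.wa[J])]
                self.wb[J] = [(1 - self.beta_b) * w + self.beta_b * min(x, w)
                              for x, w in zip(B, self.wb[J])]
                return J
            T[J] = float("-inf")                          # mismatch reset
            rho_a = max(rho_a, m_a + self.eps)            # match tracking
        # Commit an uncommitted node: learning with beta = 1 from all-ones
        # weights gives w^a = A and w^b = B.
        self.wa.append(list(A))
        self.wb.append(list(B))
        return len(self.wa) - 1


net = FuzzyARAM(rho_a_bar=0.5)
print(net.present([1.0, 0.0, 0.0, 1.0], [1.0, 0.0]))  # 0: new node, label L_1
print(net.present([0.9, 0.1, 0.1, 0.9], [0.0, 1.0]))  # 1: label L_2 forces a new node
print(net.present([0.8, 0.2, 0.2, 0.8], [1.0, 0.0]))  # 0: joins the L_1 cluster
```

In the second presentation the information vectors are close ($m_J^a \approx 0.9$), but the preference mismatch resets node 0 and match tracking lifts $\rho_a$ above 0.9, so a separate cluster is committed.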

[0043] Referring to FIG. 4, the personalization module 50 
works in conjunction with the information clustering engine 
20 to incorporate user preferences to modify the
machine-generated cluster structure.

[0044] An exemplary parameter setting for the information
clustering engine 20 is as follows: $\alpha_a = \alpha_b = 0.1$,
$\beta_a = \beta_b = 1$, $\bar{\rho}_a = 0.5$, $\rho_b = 1$, and
$\gamma = 0.5$. During automatic clustering, no user preference
is given, and the information clustering engine 20
automatically generates a cluster structure, referred to as a
machine-generated cluster structure. For each unit of
information (Φ), a pair of vectors $(A, B_0)$ is presented to
the system, where A is the representation vector of Φ and the
components of $B_0$ are

$$b_{0i} = 1 \quad \text{for } i = 1, \ldots, 2N. \qquad (7)$$

[0045] Since $|B_0 \wedge w_J^b| / |B_0|$ equals 1, condition (4)
reduces to

$$m_J^a = \frac{|A \wedge w_J^a|}{|A|} \ge \rho_a \qquad (8)$$
[0046] Essentially, the system now operates like a pure 
clustering system that self-organizes the information based 
on similarities in the information vectors. The coarseness of 
the information groupings is controlled by the baseline
vigilance parameter ($\bar{\rho}_a$).
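With $B_0$ fixed at all ones and every template preference vector still all ones, the preference term of eqn. (3) is the same constant for every node, so ordering nodes by the information term alone gives the same order as $T_j$. The sketch below is a plain fuzzy ART loop written under that assumption (all names and the four one-keyword inputs are hypothetical). It shows how the baseline vigilance alone controls coarseness: a higher value yields more, finer clusters.

```python
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def overlap(p: Vec, q: Vec) -> float:
    """|p ^ q| with the fuzzy AND taken component-wise."""
    return sum(min(x, y) for x, y in zip(p, q))


def auto_cluster(inputs: List[Vec], rho_a_bar: float,
                 alpha_a: float = 0.1) -> Tuple[List[List[float]], List[int]]:
    """Cluster complement-coded vectors using eqn. (8) only (fast learning)."""
    templates: List[List[float]] = []
    assignment: List[int] = []
    for A in inputs:
        # Visit committed nodes in order of decreasing choice value,
        # which is what repeated mismatch resets amount to.
        order = sorted(range(len(templates)),
                       key=lambda j: -overlap(A, templates[j]) / (alpha_a + sum(templates[j])))
        for J in order:
            if overlap(A, templates[J]) / sum(A) >= rho_a_bar:            # eqn. (8)
                templates[J] = [min(x, w) for x, w in zip(A, templates[J])]  # beta_a = 1
                assignment.append(J)
                break
        else:
            templates.append(list(A))          # commit a new node
            assignment.append(len(templates) - 1)
    return templates, assignment


inputs = [[1.0, 0.0], [0.75, 0.25], [0.25, 0.75], [0.0, 1.0]]  # M = 1
print(auto_cluster(inputs, rho_a_bar=0.9)[1])   # [0, 1, 2, 3]
print(auto_cluster(inputs, rho_a_bar=0.5)[1])   # [0, 0, 1, 1]
```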

[0047] A create cluster module 52 allows the user to add 
a new information cluster into the system so that information 
can be organized according to such an information grouping. 
Through the user interface module 16, the user can input a 
pair of template information and preference vectors ($w_j^a$,
$w_j^b$), which defines the key attributes of the information in
the cluster, together with a cluster label, if any. The resulting
clusters reflect the user's preferred way of grouping infor- 
mation and can be used as the default slots for organizing 
information. 
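A create-cluster operation amounts to committing a new $F_2$ node whose templates come from the user rather than from data. The sketch below assumes a simple in-memory list of clusters; the `Cluster` record, the `create_cluster` name and the keyword template are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Cluster:
    wa: List[float]               # template information vector w_j^a
    wb: List[float]               # template preference vector w_j^b
    label: Optional[str] = None   # user-supplied cluster label, if any


def create_cluster(clusters: List[Cluster], wa: Sequence[float],
                   wb: Sequence[float], label: Optional[str] = None) -> int:
    """Append a user-defined cluster and return its index j."""
    if any(not 0.0 <= v <= 1.0 for v in list(wa) + list(wb)):
        raise ValueError("template components must lie in [0, 1]")
    clusters.append(Cluster(list(wa), list(wb), label))
    return len(clusters) - 1


clusters: List[Cluster] = []
# Hypothetical 2-keyword template ("market" present, "acquisition" absent),
# complement coded, with a single-label preference vector for L_1.
j = create_cluster(clusters, [1.0, 0.0, 0.0, 1.0], [1.0, 0.0], "Singapore Economics")
print(j, clusters[j].label)  # 0 Singapore Economics
```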

[0048] A label cluster module 54 allows the user to assign 
labels to "mark" certain information groupings that are of 
particular interest (to the user) so that new information can 
be organized according to such information groupings. 
Through the user interface module 16, the user can assign a 
label to a cluster j by modifying the template preference
vector $w_j^b$ to equal $B_k$, where $B_k$ is a preference vector
representing $L_k$. Labels reflect the user's interpretation of the
groupings. They are useful landmarks to the user in navi- 
gating the information database and locating old as well as 
new information. 
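Labeling then reduces to overwriting $w_j^b$ with the complement-coded vector $B_k$ of eqn. (2). In this sketch the function names and the three unlabeled clusters are hypothetical; giving two clusters the same label, as in the second call, is the implicit local merge described in paragraph [0049].

```python
from typing import List, Sequence


def label_vector(labels: Sequence[str], label: str) -> List[float]:
    """B_k for label L_k: one-hot over the N labels, then its complement."""
    if label not in labels:
        raise ValueError(f"unknown label: {label}")
    b = [1.0 if name == label else 0.0 for name in labels]
    return b + [1.0 - x for x in b]


def label_cluster(wb: List[List[float]], j: int, labels: Sequence[str], label: str) -> None:
    """Set the template preference vector w_j^b equal to B_k."""
    wb[j] = label_vector(labels, label)


labels = ["Michael Fay Event", "Singapore Economics"]
wb = [[1.0] * 4 for _ in range(3)]     # three unlabeled clusters (all ones)
label_cluster(wb, 0, labels, "Singapore Economics")
label_cluster(wb, 2, labels, "Singapore Economics")
print(wb[0])           # [0.0, 1.0, 1.0, 0.0]
print(wb[0] == wb[2])  # True
```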

[0049] Using the label cluster module 54, the user is able 
to merge clusters implicitly by labeling them with the same 
labels. In this case, the merging is said to be a local one as 
it only affects the clusters that are labeled. To do a global 
merging, a merge cluster module 56 allows the user to 
combine two or more information groupings generated by 
the clustering process using a lower vigilance parameter 
value. Through the user interface module 16, the user can 
select one or more units of information in each of two 
different clusters as an indicative standard of similarity. As 
an example, $A_1$ and $A_2$ can be two information vectors
representing two units of information. The revised baseline
vigilance parameter $\bar{\rho}_a$ would then be computed as

$$\bar{\rho}_a' = \min\left(\frac{|A_1 \wedge A_2|}{|A_1|}, \frac{|A_1 \wedge A_2|}{|A_2|}\right) \qquad (9)$$

[0050] Using the new baseline vigilance parameter, $A_1$ and
$A_2$ will satisfy the match condition as stated in (4) and may
be grouped into one cluster as a result of the relaxed
similarity criteria. The effect of cluster merging is global, in
the sense that the system now operates at a lower vigilance
on the whole, grouping items together that it would
otherwise distinguish.
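Eqn. (9) can be computed directly from the two example vectors. In this sketch the names and vectors are hypothetical; it checks that a fast-learned template equal to $A_1$ accepts $A_2$ at the new baseline vigilance. With complement coding $|A_1| = |A_2| = M$, so the two ratios in eqn. (9) coincide; the min() only matters for inputs that are not complement coded.

```python
from typing import Sequence


def overlap(p: Sequence[float], q: Sequence[float]) -> float:
    """|p ^ q| = sum_i min(p_i, q_i)."""
    return sum(min(x, y) for x, y in zip(p, q))


def merged_vigilance(A1: Sequence[float], A2: Sequence[float]) -> float:
    """Relaxed baseline vigilance of eqn. (9)."""
    o = overlap(A1, A2)
    return min(o / sum(A1), o / sum(A2))


A1 = [1.0, 0.0, 0.0, 1.0]      # complement coded, M = 2
A2 = [0.5, 0.5, 0.5, 0.5]
rho = merged_vigilance(A1, A2)
print(rho)                                  # 0.5
print(overlap(A2, A1) / sum(A2) >= rho)     # True: A2 now matches a template equal to A1
```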

[0051] A split cluster module 58 allows the user to split an
information group into two or more clusters by indicating
that certain units of information are sufficiently different to
be grouped separately. Through the user interface module
16, the user can select two specific units of information, for
example $A_1$ and $A_2$, in a cluster and assign to them two
different labels, for example $L_1$ and $L_2$, represented by $B_1$
and $B_2$, respectively.

[0052] The updated pairs of information and preference
vectors, $(A_1, B_1)$ and $(A_2, B_2)$, together with the rest of the
vectors, are presented to the information clustering engine
20 for re-clustering. Since $B_1 \neq B_2$ and $\rho_b = 1$, $A_1$ and
$A_2$ will be grouped into different clusters. In addition, the
remaining information vectors, originally in a single cluster,
will be re-organized into one of the two new clusters based
on their similarities to $A_1$ and $A_2$.
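Why $\rho_b = 1$ separates the two labeled units can be checked numerically. The sketch below (vectors hypothetical, N = 2 labels) computes the preference-side match of eqn. (4) between the complement-coded label vectors $B_1$ and $B_2$. Because they share no nonzero component, the match is 0, so a node that has learned $B_1$ is reset whenever $B_2$ is presented.

```python
from typing import Sequence


def match(x: Sequence[float], w: Sequence[float]) -> float:
    """m = |x ^ w| / |x| from eqn. (4)."""
    return sum(min(a, b) for a, b in zip(x, w)) / sum(x)


B1 = [1.0, 0.0, 0.0, 1.0]   # label L_1 of N = 2, complement coded
B2 = [0.0, 1.0, 1.0, 0.0]   # label L_2
rho_b = 1.0

print(match(B2, B1))           # 0.0 -> a node holding B1 rejects B2
print(match(B1, B1) >= rho_b)  # True -> the same label always passes
```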

[0053] In another example, $A_1$ and $A_2$ can be used to
tighten the match condition for the entire information space.
In this case, the baseline vigilance parameter $\bar{\rho}_a$ would be
recomputed as

$$\bar{\rho}_a' = \max\left(\frac{|A_1 \wedge A_2|}{|A_1|}, \frac{|A_1 \wedge A_2|}{|A_2|}\right) + \epsilon \qquad (12)$$

[0054] where $\epsilon$ is a small constant.
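Eqn. (12) can be sketched the same way. The value of $\epsilon$ is not given in the text; 1e-3 below is an arbitrary assumption, as are the names and vectors. Raising the baseline just above both ratios means $A_2$ no longer passes the match test against a template equal to $A_1$.

```python
from typing import Sequence


def overlap(p: Sequence[float], q: Sequence[float]) -> float:
    """|p ^ q| = sum_i min(p_i, q_i)."""
    return sum(min(x, y) for x, y in zip(p, q))


def tightened_vigilance(A1: Sequence[float], A2: Sequence[float],
                        eps: float = 1e-3) -> float:
    """Stricter baseline vigilance of eqn. (12); eps is an assumed value."""
    o = overlap(A1, A2)
    return max(o / sum(A1), o / sum(A2)) + eps


A1 = [1.0, 0.0, 0.0, 1.0]
A2 = [0.5, 0.5, 0.5, 0.5]
rho = tightened_vigilance(A1, A2)            # 0.5 + eps
print(overlap(A2, A1) / sum(A2) >= rho)      # False: A2 no longer matches A1's template
```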

[0055] By selective and/or repeated application of the 
create cluster, label cluster, merge cluster and split cluster 
functions of the personalization module 50, the user is able 
to create a labeled cluster structure which incorporates his 
own preferences. 

[0056] To preserve the labeled cluster structure for a fresh 
clustering session, a store cluster module 60 transfers the 
following information to the cluster structure knowledge 
base 14. 

[0057] 1. The baseline vigilance parameter ($\bar{\rho}_a$).

[0058] 2. For each labeled cluster j, the template
information vector $w_j^a$ and the associated template
preference vector $w_j^b$.

[0059] The stored clusters and cluster structure can be 
retrieved at a later stage from the knowledge base 14 using 
a retrieve cluster module 62 to initialize the architecture of 
the information clustering engine 20, that is, the retrieved 
cluster structure is used as the initial structure of the infor- 
mation clustering engine. Based on the initialized cluster 
structure, new information can be organized according to the 
user's preferences stored over the previous sessions. The 
cluster structure may also be modified by further personal- 
ization. 
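One possible on-disk form for the cluster structure knowledge base is a JSON file holding the items listed in [0057] and [0058]. The file layout, key names and function names below are assumptions, not something the text prescribes; as stated in [0058], only labeled clusters are written.

```python
import json
import os
import tempfile
from typing import Dict, List, Tuple


def store_clusters(path: str, rho_a_bar: float, clusters: List[Dict]) -> None:
    """Write the baseline vigilance and every labeled cluster's templates."""
    data = {
        "baseline_vigilance": rho_a_bar,
        "clusters": [{"label": c["label"], "wa": c["wa"], "wb": c["wb"]}
                     for c in clusters if c.get("label") is not None],
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)


def retrieve_clusters(path: str) -> Tuple[float, List[Dict]]:
    """Read back the structure used to initialise the clustering engine."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data["baseline_vigilance"], data["clusters"]


clusters = [
    {"label": "Singapore Economics", "wa": [0.75, 0.0, 0.25, 1.0], "wb": [0.0, 1.0, 1.0, 0.0]},
    {"label": None, "wa": [0.0, 1.0, 1.0, 0.0], "wb": [1.0, 1.0, 1.0, 1.0]},
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cluster_structure.json")
    store_clusters(path, 0.5, clusters)
    rho_a_bar, restored = retrieve_clusters(path)

print(rho_a_bar, [c["label"] for c in restored])  # 0.5 ['Singapore Economics']
```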

[0060] An embodiment of the disclosed invention is per- 
sonalized document navigation wherein the user is allowed 
to customize the document navigation space, i.e. the orga- 
nized collection of information available to the user, with 
respect to his or her interpretation and preferences. 

[0061] Another embodiment of the invention is a drag- 
and-draw approach to building a categorization system. 
Information is first automatically clustered into natural 
groupings based on similarities in content. By modifying the 
machine-generated groupings, the user can define her pre- 
ferred groupings using the personalization functions as 
described herein. A classification system can be created in an 
intuitive and interactive manner without the need for 
example annotation (i.e., labeling information) or knowl- 
edge specification (e.g., written classification rules). 

[0062] Yet another embodiment is detection of new infor- 
mation and trend analysis. The user defines his know-how 
and interpretation of the environment in terms of how he 
wants the information to be organized. New information is 
supplied periodically to the information clustering system. 
New information that falls within the user-defined cluster 
structure corresponds to familiar themes of information. Any 
new information that falls outside of the defined cluster 
structure represents new themes which are potentially inter- 
esting to the user. Repeated personalization of the informa- 
tion in the information database, i.e. creating, labeling, 
merging, splitting, and storing clusters and the resulting
labeled cluster structure, helps the user to identify
information that is new with respect to his experience and to analyze
unexpected trends. 
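Flagging information that falls outside the stored structure could look like the following sketch. It assumes that an incoming unit counts as new when it fails the information-side match condition (8) against every stored template at the baseline vigilance; the function names, templates and threshold are hypothetical.

```python
from typing import List, Sequence


def match(x: Sequence[float], w: Sequence[float]) -> float:
    """m = |x ^ w| / |x|, the information-side match of eqn. (8)."""
    return sum(min(a, b) for a, b in zip(x, w)) / sum(x)


def is_new_theme(A: Sequence[float], templates: List[Sequence[float]],
                 rho_a_bar: float) -> bool:
    """True if A resonates with none of the stored templates w_j^a."""
    return all(match(A, wa) < rho_a_bar for wa in templates)


templates = [[1.0, 0.0, 0.0, 1.0],   # keyword 1 present, keyword 2 absent
             [0.0, 1.0, 1.0, 0.0]]   # keyword 2 present, keyword 1 absent
print(is_new_theme([1.0, 0.0, 0.0, 1.0], templates, 0.75))  # False: familiar theme
print(is_new_theme([1.0, 1.0, 0.0, 0.0], templates, 0.75))  # True: both keywords at once
```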

[0063] Various preferred embodiments of the invention 
have now been described. While these embodiments have 
been set forth by way of example, various other embodi- 
ments and modifications will be apparent to those skilled in 
the art. Accordingly, it should be understood that the inven- 
tion is not limited to such embodiments, but encompasses all 
that which is described in the following claims. 

What is claimed is:

1. A method of organizing information into a plurality of 
classes or clusters with a user-configurable information 
clustering system comprising: 

a) grouping units of information into clusters based on 
similarities to create a cluster structure; and 

b) personalizing said cluster structure according to user 
knowledge and preferences. 

2. The method according to claim 1 wherein said grouping 
units of information into clusters is carried out automatically 
to create a machine-generated cluster structure. 

3. The method according to claim 1 wherein said person- 
alizing comprises creating at least one new information 
cluster. 

4. The method according to claim 3 wherein said person- 
alizing further comprises labeling each information cluster. 

5. The method according to claim 4 wherein said person- 
alizing further comprises merging information clusters. 

6. The method according to claim 5 wherein said person- 
alizing further comprises splitting at least one information 
cluster. 

7. The method according to claim 6 wherein said person- 
alizing further comprises storing said cluster structure in a 
knowledge base. 

8. The method according to claim 1 wherein said person- 
alizing comprises labeling each information cluster. 

9. The method according to claim 1 wherein said person- 
alizing comprises merging information clusters. 

10. The method according to claim 1 wherein said per- 
sonalizing comprises splitting at least one information clus- 
ter. 

11. The method according to claim 1 wherein said per- 
sonalizing comprises storing said cluster structure in a 
knowledge base. 
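The personalization operations recited in claims 3 to 11 (creating, labeling, merging, splitting and storing clusters) can be pictured with a small data structure. The Python sketch below is hypothetical: the class name, the integer cluster ids and the use of JSON as the knowledge-base storage format are assumptions made for illustration, not details from this disclosure.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ClusterStructure:
    """Hypothetical cluster structure: cluster id -> label and member item ids."""
    labels: dict = field(default_factory=dict)   # cluster id -> user-assigned label
    members: dict = field(default_factory=dict)  # cluster id -> list of item ids
    next_id: int = 0

    def create(self, item_ids, label=""):
        """Create a new cluster, moving the given items out of any cluster holding them."""
        moving = set(item_ids)
        for cid in list(self.members):
            self.members[cid] = [i for i in self.members[cid] if i not in moving]
            if not self.members[cid]:            # drop clusters emptied by the move
                del self.members[cid], self.labels[cid]
        cid, self.next_id = self.next_id, self.next_id + 1
        self.members[cid] = list(item_ids)
        self.labels[cid] = label
        return cid

    def set_label(self, cid, text):
        self.labels[cid] = text

    def merge(self, keep, absorb, label=None):
        """Fold cluster `absorb` into cluster `keep`, optionally relabelling it."""
        self.members[keep].extend(self.members.pop(absorb))
        del self.labels[absorb]
        if label is not None:
            self.labels[keep] = label

    def split(self, cid, item_ids, label=""):
        """Split the given items off cluster `cid` into a new cluster."""
        if not set(item_ids) <= set(self.members[cid]):
            raise ValueError("only items of the cluster can be split off")
        return self.create(item_ids, label)

    def to_json(self):
        """Serialize for a knowledge base (JSON turns the int keys into strings)."""
        return json.dumps({"labels": self.labels, "members": self.members,
                           "next_id": self.next_id})

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        return cls(labels={int(k): v for k, v in data["labels"].items()},
                   members={int(k): v for k, v in data["members"].items()},
                   next_id=data["next_id"])


cs = ClusterStructure()
a = cs.create(["d1", "d2", "d3"])        # e.g. two machine-generated clusters
b = cs.create(["d4", "d5"])
cs.set_label(a, "sports")
cs.merge(a, b, label="sports")           # user judges both clusters to be one theme
t = cs.split(a, ["d3"], label="tennis")  # ...but wants tennis kept apart
assert ClusterStructure.from_json(cs.to_json()) == cs
print(cs.labels, cs.members)  # {0: 'sports', 2: 'tennis'} {0: ['d1', 'd2', 'd4', 'd5'], 2: ['d3']}
```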

12. The method according to claim 1 wherein said infor- 
mation comprises text, image, audio, video or any combi- 
nation thereof. 

13. The method according to claim 1 wherein said user- 
configurable information clustering system comprises an 
adaptive resonance associative map. 

14. The method according to claim 1 wherein said user- 
configurable information clustering system incorporates 
user knowledge and preferences for information clustering. 

15. The method according to claim 1 wherein said user- 
configurable information clustering system further com- 
prises a user interface. 

16. The method according to claim 1 wherein each of said 
units of information is represented by an information vector. 

17. The method according to claim 1 wherein a user- 
preferred information grouping is represented by a prefer- 
ence vector. 
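Claims 16 and 17 leave the vector encodings open. One possible encoding, sketched below in Python purely as an assumption, turns a text item into a term-frequency vector over a fixed vocabulary, scales it to [0, 1], complement codes it as in fuzzy ART, and represents a user-preferred grouping as a one-hot preference vector. The vocabulary, function names and scaling choice are hypothetical.

```python
import numpy as np


def information_vector(text, vocabulary):
    """Term-frequency vector over a fixed vocabulary, scaled to [0, 1], complement coded."""
    tokens = text.lower().split()
    counts = np.array([tokens.count(term) for term in vocabulary], dtype=float)
    if counts.max() > 0:
        counts /= counts.max()
    # Complement coding: append 1 - x so every vector has L1 norm len(vocabulary).
    return np.concatenate([counts, 1.0 - counts])


def preference_vector(cluster_index, n_clusters):
    """One-hot vector marking the user-preferred grouping of an item."""
    v = np.zeros(n_clusters)
    v[cluster_index] = 1.0
    return v


vocabulary = ["goal", "match", "stock", "market"]   # hypothetical vocabulary
print(information_vector("Late goal wins the match goal", vocabulary).tolist())
# [1.0, 0.5, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0]
print(preference_vector(1, 3).tolist())  # [0.0, 1.0, 0.0]
```

Complement coding keeps every information vector at the same L1 norm (here 4, the vocabulary size), so a single vigilance value means the same thing for short and long items.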

18. The method according to claim 1 wherein said units of
information are grouped into classes or clusters based on a 
similarity function. 

19. The method according to claim 18 wherein said 
classes or clusters have a coarseness which is controlled by 
a baseline vigilance parameter. 
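Claims 18 and 19 tie cluster coarseness to a baseline vigilance parameter. The Python sketch below shows that effect with a plain fuzzy ART clusterer (choice function |x AND w| / (alpha + |w|), vigilance test |x AND w| / |x| >= rho, fast learning w <- x AND w). Fuzzy ART is assumed here as a simpler stand-in for the adaptive resonance associative map of claim 13; the parameter values and data are illustrative only.

```python
import numpy as np


def fuzzy_art(inputs, vigilance, alpha=0.001, beta=1.0):
    """Cluster complement-coded inputs; returns (cluster index per input, prototypes)."""
    prototypes, assignments = [], []
    for x in inputs:
        # Search existing categories in order of the choice function |x ^ w| / (alpha + |w|).
        order = sorted(range(len(prototypes)),
                       key=lambda j: -np.minimum(x, prototypes[j]).sum()
                       / (alpha + prototypes[j].sum()))
        for j in order:
            # Vigilance test: resonate only if |x ^ w| / |x| >= vigilance.
            if np.minimum(x, prototypes[j]).sum() / x.sum() >= vigilance:
                prototypes[j] = (beta * np.minimum(x, prototypes[j])
                                 + (1 - beta) * prototypes[j])
                assignments.append(j)
                break
        else:
            # No category passed the vigilance test: commit a new one.
            prototypes.append(x.copy())
            assignments.append(len(prototypes) - 1)
    return assignments, prototypes


values = [0.1, 0.15, 0.5, 0.55, 0.9]
inputs = [np.array([a, 1.0 - a]) for a in values]   # complement-coded 1-D items
fine, _ = fuzzy_art(inputs, vigilance=0.9)
coarse, _ = fuzzy_art(inputs, vigilance=0.5)
print(fine)    # [0, 0, 1, 1, 2]
print(coarse)  # [0, 0, 0, 0, 1]
```

With complement-coded one-dimensional inputs and fast learning, each category covers an interval, and a vigilance of rho limits that interval to width 1 - rho, which is why the higher setting yields more, finer clusters.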

20. The method according to claim 1 further comprising 
indication by a user of a preference for a lower baseline 
vigilance parameter by selecting at least one unit of infor- 
mation from each of at least two clusters wherein the 
selected units of information are deemed by the user to be 
similar to each other.

21. The method according to claim 1 further comprising 
indication by a user of a preference for a higher baseline 
vigilance parameter by selecting at least two units of infor- 
mation in a cluster, wherein said units of information are 
deemed by the user to be dissimilar to each other. 
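Claims 20 and 21 describe the user signalling a lower or higher baseline vigilance by pointing at items. One hypothetical way to turn such feedback into a new vigilance value, assuming complement-coded vectors, fast learning and the fuzzy ART similarity |x AND y| / |x|, is sketched below in Python: lower the vigilance to the similarity of items the user wants grouped together, or raise it just above the similarity of items the user wants separated. The rule, the margin and the function names are assumptions, not the procedure of this disclosure.

```python
import numpy as np


def pair_similarity(x, y):
    """Fuzzy ART-style similarity |x AND y| / |x|; symmetric for complement-coded vectors."""
    return float(np.minimum(x, y).sum() / x.sum())


def lower_vigilance(vigilance, similar_pairs):
    """Pairs from different clusters that the user says belong together.

    A vigilance no higher than their similarity lets them share a category.
    """
    for x, y in similar_pairs:
        vigilance = min(vigilance, pair_similarity(x, y))
    return vigilance


def raise_vigilance(vigilance, dissimilar_pairs, margin=0.01):
    """Pairs from one cluster that the user says are dissimilar.

    Under fast learning a category holding both has a prototype no larger
    than x AND y, so a vigilance above their similarity keeps them apart.
    """
    for x, y in dissimilar_pairs:
        vigilance = max(vigilance, min(1.0, pair_similarity(x, y) + margin))
    return vigilance


def cc(a):
    """Complement-code a scalar feature in [0, 1]."""
    return np.array([a, 1.0 - a])


print(round(lower_vigilance(0.9, [(cc(0.15), cc(0.5))]), 2))  # 0.65
print(round(raise_vigilance(0.5, [(cc(0.1), cc(0.55))]), 2))  # 0.56
```

Because fuzzy ART results depend on presentation order, lowering the vigilance this way makes the user's grouping possible rather than guaranteed; a system built along these lines might re-cluster and check the outcome.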

22. The method according to claim 1 further comprising 
retrieving said cluster structure to initialize said user-con- 
figurable information clustering system prior to clustering 
new information. 

23. A method of building information classification sys- 
tems comprising: 

a) grouping information vectors into clusters based on 
similarities; 

b) assigning a category label to each cluster; 

c) merging information clusters; 

d) splitting information clusters; and 

e) storing labeled clusters as the defined categories of a 
classification system. 
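Claim 23 ends by storing the labeled clusters as the categories of a classification system. A minimal Python sketch of that last step, assuming complement-coded vectors, builds one prototype per label as the fuzzy AND (element-wise minimum) of its member vectors and assigns a new item to the label whose prototype scores highest under the fuzzy ART choice function. Function names, labels and data are hypothetical.

```python
from functools import reduce

import numpy as np


def build_classifier(labeled_members):
    """labeled_members: dict label -> list of complement-coded member vectors.

    Each category prototype is the element-wise minimum of its members,
    i.e. the smallest fuzzy ART hyperbox that covers them.
    """
    return {label: reduce(np.minimum, vectors)
            for label, vectors in labeled_members.items()}


def classify(x, prototypes, alpha=0.001):
    """Return the label maximising the choice function |x AND w| / (alpha + |w|)."""
    return max(prototypes,
               key=lambda label: np.minimum(x, prototypes[label]).sum()
               / (alpha + prototypes[label].sum()))


def cc(a):
    """Complement-code a scalar feature in [0, 1]."""
    return np.array([a, 1.0 - a])


prototypes = build_classifier({"low": [cc(0.1), cc(0.15)],
                               "mid": [cc(0.5), cc(0.55)],
                               "high": [cc(0.9)]})
print(classify(cc(0.2), prototypes), classify(cc(0.8), prototypes))  # low high
```

In this representation, merging two labeled clusters amounts to taking the element-wise minimum of their prototypes, while splitting requires rebuilding prototypes from the member vectors, which is one reason to keep members alongside labels.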

24. The method according to claim 23 wherein said 
information comprises text, image, audio, video or any 
combination thereof. 

25. A method for new information detection and trend 
analysis with a user-configurable information clustering 
system comprising: 

a) grouping information vectors into clusters based on 
similarities; 

b) creating a labeled cluster structure by assigning a 
category label to each cluster, merging information 
clusters and splitting information clusters according to 
a user's preferences; 

c) storing said labeled cluster structure, wherein said 
cluster structure defines the user's knowledge; 

d) retrieving said cluster structure; 

e) initializing the information clustering system using said 
retrieved cluster structure; and 

f) analyzing new clusters, wherein said clusters are 
grouped according to the user's preferences. 

26. The method according to claim 25 wherein said 
information comprises text, image, audio, video or any 
combination thereof. 

27. A user-configurable information clustering system 
comprising: 

a) an information clustering engine for clustering units of 
information based on similarities to create a cluster 
structure; 

b) a personalization module for personalizing said cluster 
structure according to user knowledge and preferences; 

c) a user interface; and 

d) a knowledge base for storing said cluster structure. 

28. The system according to claim 27 wherein said 
information clustering engine automatically clusters infor- 
mation to create a machine-generated cluster structure. 

29. The system according to claim 27 wherein said 
personalization module comprises means for creating at 
least one new information cluster. 

30. The system according to claim 29 wherein said 
personalization module further comprises means for label- 
ing each information cluster. 

31. The system according to claim 30 wherein said 
personalization module further comprises means for merg- 
ing information clusters. 

32. The system according to claim 31 wherein said 
personalization module further comprises means for split- 
ting at least one information cluster. 

33. The system according to claim 32 wherein said 
personalization module further comprises means for storing 
the cluster structure in said knowledge base. 

34. The system according to claim 33 wherein said 
personalization module further comprises means for retriev- 
ing the cluster structure from said knowledge base. 

35. The system according to claim 27 wherein said 
personalization module comprises means for labeling each 
information cluster. 

36. The system according to claim 27 wherein said 
personalization module comprises means for merging infor- 
mation clusters. 

37. The system according to claim 27 wherein said 
personalization module comprises means for splitting at 
least one information cluster. 

38. The system according to claim 27 wherein said 
personalization module comprises means for storing the 
cluster structure in said knowledge base. 

39. The system according to claim 27 wherein said 
personalization module comprises means for retrieving the 
cluster structure from said knowledge base. 

40. The system according to claim 27 wherein said 
information comprises text, image, audio, video or any 
combination thereof. 

41. The system according to claim 27 wherein user 
knowledge and preferences are incorporated in information 
clustering. 

42. The system according to claim 27 wherein said 
information clustering engine comprises an adaptive reso- 
nance associative map. 

43. The system according to claim 27 wherein said user 
interface permits graphical visualization of said information 
clusters. 

44. The system according to claim 27 wherein each of said 
units of information is represented by an information vector. 

45. The system according to claim 27 wherein a user- 
preferred information grouping is represented by a prefer- 
ence vector. 

46. The system according to claim 27 wherein said units 
of information are grouped into classes or clusters based on 
a similarity function. 

47. The system according to claim 46 wherein said classes 
or clusters have a coarseness which is controlled by a 
baseline vigilance parameter. 

48. The system according to claim 27 wherein said
personalization module permits indication by a user of a 
preference for a lower baseline vigilance parameter by 
selecting at least one unit of information from each of at 
least two clusters wherein said selected units of information 
are deemed by the user to be similar to each other. 

49. The system according to claim 27 wherein said 
personalization module permits indication by a user of a 
preference for a higher baseline vigilance parameter by 
selecting at least two units of information in a cluster, 
wherein said units of information are deemed by the user to 
be dissimilar to each other. 

50. An information classification system comprising: 

a) means for grouping information vectors based on 
similarities; 

b) means for creating a plurality of information clusters; 

c) means for labeling each information cluster; 

d) means for merging information clusters; 

e) means for splitting at least one information cluster; and 

f) means for storing labeled clusters as the defined cat- 
egories of a classification system. 

51. The system according to claim 50 wherein the infor- 
mation comprises text, image, audio, video or any combi- 
nation thereof. 

52. A system for new information detection and trend 
analysis comprising: 

a) means for grouping information vectors into clusters 
based on similarities; 

b) means for creating a plurality of information clusters; 

c) means for assigning a category label to a cluster; 

d) means for merging information clusters; 

e) means for splitting at least one information cluster; 

f) means for storing labeled clusters wherein said clusters
define a user's knowledge of said information; and

g) means for retrieving said clusters to permit analysis of 
new clusters, wherein said new clusters are grouped 
according to the user's preferences. 

53. The system according to claim 52 wherein the infor- 
mation comprises text, image, audio, video or any combi- 
nation thereof. 

* * * * * 